FinBERT: A Large Language Model for Extracting Information from Financial Text

Abstract

We develop FinBERT, a state-of-the-art large language model that adapts to the finance domain. We show that FinBERT incorporates finance knowledge and can better summarize contextual information in financial texts. Using a sample of researcher-labeled sentences from analyst reports, we document that FinBERT substantially outperforms the Loughran and McDonald dictionary and other machine learning algorithms, including naïve Bayes, support vector machine, random forest, convolutional neural network, and long short-term memory, in sentiment classification. Our results show that FinBERT excels in identifying the positive or negative sentiment of sentences that other algorithms mislabel as neutral, likely because it uses contextual information in financial text. We find that FinBERT's advantage over other algorithms, and Google's original bidirectional encoder representations from transformers model, is especially salient when the training sample size is small and in texts containing financial words not frequently used in general texts. FinBERT also outperforms other models in identifying discussions related to environment, social, and governance issues. Last, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 18% compared to FinBERT. Our results have implications for academic researchers, investment professionals, and financial market regulators.

Similar resources

Extracting Financial Information from Text Documents

The majority of electronic data today is in textual form. Financial data such as articles in the Wall Street Journal are written as texts. These electronic documents contain a wealth of information but require human interpretation. For financial analysis, rapid up-to-date information is critical. Most software tools currently require data which are better structured than text (such as data in r...

FASTUS: A System for Extracting Information from Natural-Language Text

FASTUS is a system for extracting information from free text in English, and potentially other languages as well, for entry into a database, and potentially for other applications. It works essentially as a cascaded, nondeterministic finite state automaton. There are four steps in the operation of FASTUS. In Step 1 sentences are scanned for certain trigger words to determine whether further pro...

Building Large Scale Text Corpus for Tibetan Natural Language Processing by Extracting Text from Web Pages

In this paper, we propose an approach to build a large scale text corpus for Tibetan natural language processing. We find the distribution of Tibetan web pages on the internet with a crawler which can identify whether or not a web page contains Tibetan text. Three biggest web sites are selected, and topic pages are selected with a rule based method by checking the url. The layout structures of ...

FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text

FASTUS is a system for extracting information from natural language text for entry into a database and for other applications. It works essentially as a cascaded, nondeterministic finite-state automaton. There are five stages in the operation of FASTUS. In Stage 1, names and other fixed form expressions are recognized. In Stage 2, basic noun groups, verb groups, and prepositions and some other partic...

Extracting Semantic Representations from Large Text Corpora

Many connectionist language processing models have now reached a level of detail at which more realistic representations of semantics are required. In this paper we discuss the extraction of semantic representations from the word co-occurrence statistics of large text corpora and present a preliminary investigation into the validation and optimisation of such representations. We find that there...

Journal

Journal title: Contemporary Accounting Research

Year: 2023

ISSN: 1911-3846, 0823-9150

DOI: https://doi.org/10.1111/1911-3846.12832